NOISE REDUCTION METHOD 

BACKGROUND OF THE INVENTION 
1. Field of the Invention

The present invention relates to a noise reduction method and, more
particularly, to a method using spectral subtraction to reduce noise.
2. Description of Related Art 

The spectral subtraction method has proven effective in enhancing
speech degraded by additive noise. It is simple to implement and hence
suitable as a pre-processing scheme for speech coding and recognition
applications. This method subtracts the noise spectrum estimate from the
noisy speech spectrum to estimate the speech magnitude spectrum, so as to
obtain the clean speech signal.

FIG. 1 shows the flowchart of the aforementioned spectral subtraction
method, wherein the input noisy speech is divided into a plurality of
consecutive frames, and each frame is represented by an additive noise
model:

$$y_r(k) = s_r(k) + w_r(k),$$

where $y_r(k)$, $s_r(k)$ and $w_r(k)$ denote respectively the k-th noisy
speech, clean speech, and noise sample of the r-th frame. Taking the fast
Fourier transform of the noisy speech frame $y_r(k)$ (step S101), the noisy
speech spectrum of the r-th frame at the k-th frequency component is
obtained and denoted as $|Y_r(k)|^2$. In addition, the noisy speech $y_r(k)$
is also applied in a silence detection process (step S102) and a noise
spectrum estimation process (step S103) to estimate a noise spectrum,
denoted as $|W_r(k)|^2$. After performing a spectral subtraction process
(step S104), the energy spectrum of clean speech is obtained as follows:

$$|\hat{S}_r(k)|^2 = |Y_r(k)|^2 - |W_r(k)|^2. \qquad (1)$$

If the phase spectrum of the clean speech can be approximated by the
phase spectrum of the noisy speech, the estimate of clean speech
$\hat{s}_r(k)$ can be obtained by taking the inverse fast Fourier transform
of $|\hat{S}_r(k)|$ combined with the noisy phase.
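
By way of illustration only, the basic spectral subtraction of formula (1) for a single frame might be sketched in Python with NumPy as follows. The function name, the use of a real-input FFT, and the clipping of negative results to zero are assumptions made for this sketch; the text does not specify them.

```python
import numpy as np

def spectral_subtract_frame(y, noise_power):
    """Basic spectral subtraction for one frame, following formula (1).

    y           : real-valued noisy frame y_r(k)
    noise_power : noise power spectrum estimate |W_r(k)|^2, one value per
                  rfft bin (len(y) // 2 + 1 values)
    """
    Y = np.fft.rfft(y)                       # noisy spectrum Y_r(k)
    noisy_power = np.abs(Y) ** 2             # |Y_r(k)|^2
    clean_power = noisy_power - noise_power  # formula (1)
    # Assumption: negative values are clipped to zero; the text does not
    # say how they are handled.
    clean_power = np.maximum(clean_power, 0.0)
    # Reuse the phase of the noisy speech as the phase of the estimate.
    S_hat = np.sqrt(clean_power) * np.exp(1j * np.angle(Y))
    return np.fft.irfft(S_hat, n=len(y))

# Usage with hypothetical data: a zero noise estimate leaves the frame unchanged.
rng = np.random.default_rng(0)
frame = rng.standard_normal(256)
restored = spectral_subtract_frame(frame, np.zeros(129))
```

The noise estimate is taken as an input here because silence detection and noise spectrum estimation (steps S102 and S103) are not detailed in the text.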

Such a method is suitable as a pre-processing scheme for speech coding
and recognition applications because it is effective and simple to
implement. However, the noise spectrum estimate may cause a relatively
large spectral excursion in the spectrum estimate of clean speech. This
spectral excursion is perceived as time-varying tones that contribute to
the so-called musical noise.

To reduce the musical noise, Berouti et al. proposed a noise reduction
method that over-subtracts the noise spectrum estimate. A description can
be found in M. Berouti, R. Schwartz, and J. Makhoul, "Enhancement of
speech corrupted by acoustic noise", pp. 208-211, 1979 IEEE, which is
incorporated herein for reference, wherein formula (1) is modified as:

$$|\hat{S}_r(k)|^2 = |Y_r(k)|^2 - \alpha_r \cdot |W_r(k)|^2, \quad \alpha_r \ge 1, \qquad (2)$$

so as to decrease the influence caused by the excursion of the noise
spectrum estimate and thus reduce the effect of musical noise. In this
method, the over-subtraction factor $\alpha_r$ is determined by the
signal-to-noise ratio (SNR) of the frame being processed, and can be
expressed by the formula:

$$\alpha_r = \alpha_0 + SNR_r \cdot \frac{1 - \alpha_0}{SNR_1}, \qquad (3)$$

where $\alpha_0$ is the pre-selected over-subtraction factor when SNR = 0,
$SNR_1$ is the pre-selected SNR value when $\alpha_r = 1$, and $SNR_r$ is
the estimate of the signal-to-noise ratio of the r-th frame being
processed. From formula (3), it is known that $\alpha_r$ decreases linearly
as $SNR_r$ increases. The smaller $SNR_r$ is, the larger $\alpha_r$ is, and
a larger $\alpha_r$ helps remove larger noise spectrum excursions.

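
As a minimal sketch, the mapping from frame SNR to over-subtraction factor can be written as the straight line through the two points the text defines (factor $\alpha_0$ at SNR = 0, factor 1 at SNR = $SNR_1$). The function name is hypothetical, and clamping the result at 1 for frames above $SNR_1$ is an assumption motivated by the constraint $\alpha_r \ge 1$ in formula (2).

```python
def over_subtraction_factor(snr, alpha0, snr1):
    """Over-subtraction factor as a linear function of the frame SNR (dB).

    Equals alpha0 at snr = 0 and 1 at snr = snr1.
    """
    alpha = alpha0 + snr * (1.0 - alpha0) / snr1
    # Assumption: never subtract less than the plain noise estimate,
    # in line with the constraint alpha_r >= 1 in formula (2).
    return max(1.0, alpha)

# Parameters quoted later in the text for the conventional method.
alpha_at_10_db = over_subtraction_factor(10.0, alpha0=7.5, snr1=20.0)
```

With $\alpha_0 = 7.5$ and $SNR_1 = 20$, a frame at 10 dB receives a factor of 4.25, halfway between the two anchor points.
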
Examining the human speech spectrum, it is known that speech energy is
distributed non-uniformly and is often concentrated in the lower frequency
components. Hence the SNR differs with frequency and often has larger
values at lower frequency components. From formula (3), it is known that
more suppression is needed for lower SNR and vice versa. High-frequency
components thus need more suppression to avoid musical noise, while
low-frequency components need less suppression to prevent speech
distortion. However, the over-subtraction method based on formulas (2) and
(3) applies a single factor to the whole frame, so it suffers from too
much over-subtraction, and hence speech distortion, at low-frequency
components, and too little over-subtraction, and hence musical noise, at
high-frequency components. Accordingly, improved schemes have been
proposed to avoid this problem; one of them can be found in Kuo-Guan Wu
and Po-Cheng Chen, "Efficient speech enhancement using spectral
subtraction for car hands-free application", 2001 Digest of technical
papers, pp. 220-221, which is incorporated herein for reference. However,
it is unable to completely eliminate the problem. Therefore, there is a
need for the above conventional noise reduction method to be improved.

SUMMARY OF THE INVENTION 



The object of the present invention is to provide a noise reduction 
method capable of effectively eliminating the musical noise and reducing 
speech distortion. 

To achieve the object, the noise reduction method divides input noisy
speech into a plurality of consecutive frames, determines the noisy speech
spectrum for each frame, and partitions the frequency band into multiple
sub-bands to determine the clean speech spectrum from the noisy speech
spectrum on each sub-band. The method first estimates the noise spectrum
of the r-th frame at the k-th frequency component from the noisy speech of
the r-th frame by silence detection and noise spectrum estimation. Next,
the signal-to-noise ratio (SNR) value of the i-th sub-band for the r-th
frame is estimated. Then, an over-subtraction factor of sub-band i is
determined based on the estimated sub-band SNR. Finally, the clean speech
spectrum estimate is determined by performing a spectral subtraction on
each sub-band.

Other objects, advantages, and novel features of the invention will
become more apparent from the following detailed description when taken
in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is the flowchart of a conventional spectral subtraction method. 

FIG. 2 is the flowchart of the noise reduction method in accordance
with the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

With reference to FIG. 2, there is shown the flowchart of a preferred
embodiment of the noise reduction method in accordance with the present
invention. As shown, the input noisy speech of the r-th frame,
$y_r(k) = s_r(k) + w_r(k)$, is processed by FFT (Fast Fourier Transform)
(step S201) to obtain its energy spectrum $|Y_r(k)|^2$. The noisy speech
$y_r(k)$ is also processed by silence detection (step S202) and noise
spectrum estimation (step S203) to estimate the noise spectrum of the r-th
frame, denoted as $|W_r(k)|^2$.

For the noisy speech spectrum $|Y_r(k)|^2$ and noise spectrum
$|W_r(k)|^2$, the method of the present invention utilizes a sub-band
over-subtraction mechanism to determine the estimate of the clean speech
spectrum $|\hat{S}_r(k)|^2$, which is then processed by IFFT (Inverse Fast
Fourier Transform) (step S207) to be restored to the enhanced frame signal
$\hat{s}_r(k)$. The method of the present invention partitions the
frequency band into multiple sub-bands and performs over-subtraction on
each sub-band. To implement over-subtraction on each sub-band, a sub-band
SNR estimation is first performed (step S204) to estimate an SNR value for
determining the over-subtraction factor of the sub-band. The SNR value can
be obtained by a regression formula as follows:



$$SNR_r(i) = \mu \cdot SNR^{o}_{r-1}(i) + (1 - \mu) \cdot 10 \cdot \log_{10}\left( \frac{\sum_{k \in \text{sub-band } i} |Y_r(i,k)|^2}{\sum_{k \in \text{sub-band } i} |W_r(i,k)|^2} \right),$$



where i is the index of the sub-band, $SNR_r(i)$ is the SNR estimate of the
i-th sub-band for the r-th frame, $|Y_r(i,k)|^2$ is the noisy speech
spectrum of the r-th frame at the k-th frequency component of the i-th
sub-band, $|W_r(i,k)|^2$ is the corresponding noise spectrum,
$0 < \mu < 1$, and $SNR^{o}_{r-1}(i)$ is the SNR of the sub-band for the
previous frame after noise reduction, which is expressed by the following
formula:

$$SNR^{o}_{r-1}(i) = 10 \cdot \log_{10}\left( \frac{\sum_{k \in \text{sub-band } i} |\hat{S}_{r-1}(i,k)|^2}{\sum_{k \in \text{sub-band } i} |W_{r-1}(i,k)|^2} \right),$$

where $|\hat{S}_{r-1}(i,k)|^2$ is the estimate of the clean speech spectrum
of the previous, i.e., the (r-1)-th, frame after being processed in
sub-band i.
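
The recursive sub-band SNR estimate of step S204 could be sketched as below. This is an illustration under stated assumptions rather than the patented implementation: the function name, the representation of sub-bands as (start, stop) bin index pairs, and the small constant `eps` guarding against division by zero are all hypothetical, and the previous-frame SNR after noise reduction is simply supplied by the caller.

```python
import numpy as np

def subband_snr(noisy_power, noise_power, band_edges, prev_snr_clean, mu):
    """Recursive sub-band SNR estimate in dB (step S204).

    noisy_power    : |Y_r(k)|^2 for every frequency bin of frame r
    noise_power    : |W_r(k)|^2 for every frequency bin of frame r
    band_edges     : list of (start, stop) bin indices, one pair per sub-band
    prev_snr_clean : per-sub-band SNR (dB) of frame r-1 after noise reduction
    mu             : smoothing weight, 0 < mu < 1
    """
    eps = 1e-12  # assumption: avoids log(0) and division by zero
    snr = np.empty(len(band_edges))
    for i, (lo, hi) in enumerate(band_edges):
        ratio = (noisy_power[lo:hi].sum() + eps) / (noise_power[lo:hi].sum() + eps)
        snr[i] = mu * prev_snr_clean[i] + (1.0 - mu) * 10.0 * np.log10(ratio)
    return snr

# Usage with hypothetical values: two sub-bands of two bins each.
snr_example = subband_snr(
    noisy_power=np.array([10.0, 10.0, 100.0, 100.0]),
    noise_power=np.ones(4),
    band_edges=[(0, 2), (2, 4)],
    prev_snr_clean=np.array([4.0, 8.0]),
    mu=0.25,
)
```

A larger `mu` leans more on the previous frame and gives a smoother but slower-reacting estimate; the experiments reported in the text use $\mu = 0.25$.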

In step S205, the sub-band over-subtraction factor $\alpha_r(i)$ is
determined based on the estimated sub-band SNR value $SNR_r(i)$, and is
expressed by the following formula:

$$\alpha_r(i) = \alpha_0(i) + SNR_r(i) \cdot \frac{1 - \alpha_0(i)}{SNR_1(i)},$$

where $\alpha_0(i)$ is the pre-selected over-subtraction factor when the
actual $SNR_r(i) = 0$ at sub-band i, and $SNR_1(i)$ represents the
pre-selected SNR value when $\alpha_r(i) = 1$.

Once the over-subtraction factor $\alpha_r(i)$ has been determined for each
sub-band i, spectral over-subtraction can be performed on each sub-band i
(step S206), as expressed by the following formula:

$$|\hat{S}_r(i,k)|^2 = |Y_r(i,k)|^2 - \alpha_r(i) \cdot |W_r(i,k)|^2,$$

wherein the determined $|\hat{S}_r(i,k)|^2$ is the clean speech spectrum at
sub-band i for the r-th frame. After performing over-subtraction for each
sub-band i, the IFFT is applied (step S207) to obtain the estimated
enhanced frame signal $\hat{s}_r(k)$.
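
Steps S206 and S207 might be sketched together as follows, assuming the per-sub-band factors have already been computed. The function name, the (start, stop) representation of sub-bands, the clipping of negative values to zero, and the reuse of the noisy phase for the inverse transform are assumptions of this sketch; bins not covered by any sub-band are left unchanged.

```python
import numpy as np

def subband_over_subtract(y, noise_power, band_edges, alpha):
    """Per-sub-band over-subtraction (S206) followed by the IFFT (S207).

    y           : real-valued noisy frame y_r(k)
    noise_power : |W_r(k)|^2 per rfft bin (len(y) // 2 + 1 values)
    band_edges  : list of (start, stop) bin indices, one pair per sub-band
    alpha       : over-subtraction factor alpha_r(i), one per sub-band
    Returns the enhanced frame and the estimated clean power spectrum.
    """
    Y = np.fft.rfft(y)
    noisy_power = np.abs(Y) ** 2
    clean_power = noisy_power.copy()
    for i, (lo, hi) in enumerate(band_edges):
        clean_power[lo:hi] = noisy_power[lo:hi] - alpha[i] * noise_power[lo:hi]
    # Assumption: negative power values are clipped to zero.
    clean_power = np.maximum(clean_power, 0.0)
    # Assumption: the noisy phase is reused for the enhanced frame.
    S_hat = np.sqrt(clean_power) * np.exp(1j * np.angle(Y))
    return np.fft.irfft(S_hat, n=len(y)), clean_power

# Usage with hypothetical data: 16 samples give 9 bins, split into 2 sub-bands.
rng = np.random.default_rng(0)
frame = rng.standard_normal(16)
bands = [(0, 4), (4, 9)]
enhanced, power = subband_over_subtract(frame, np.ones(9), bands, [1.0, 2.0])
```

Returning the clean power spectrum alongside the frame is a convenience of the sketch: it is the quantity needed to form the previous-frame SNR after noise reduction for the next frame.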

In executing the aforementioned method, due to the small number of
frequency samples in the lower bands, there will be large variation in the
sub-band SNR estimate when the noise is strong, which may cause an error
in $\alpha_r(i)$ and degrade the quality of the restored speech. To avoid
this problem, in step S205, the SNR value $SNR_r$ of the whole frame is
incorporated into the modification of the sub-band over-subtraction
factors as follows:

$$\alpha_r(i) = \alpha_{max} \quad \text{if } SNR_r < SNR_{min},$$

where $SNR_{min}$ is a pre-selected minimum value of SNR.
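
The whole-frame safeguard can be illustrated with a few lines of Python. The function name and the numeric values in the usage lines are hypothetical, and the sketch reads the rule literally: when the frame SNR falls below the threshold, every sub-band factor is replaced by the same maximum value.

```python
import numpy as np

def apply_frame_snr_floor(alpha, frame_snr, snr_min, alpha_max):
    """Override all sub-band factors when the whole-frame SNR is too low.

    alpha     : sub-band over-subtraction factors alpha_r(i)
    frame_snr : SNR of the whole frame r, in dB
    snr_min   : pre-selected minimum SNR, in dB
    alpha_max : factor applied to every sub-band below snr_min
    """
    alpha = np.asarray(alpha, dtype=float)
    if frame_snr < snr_min:
        return np.full_like(alpha, alpha_max)
    return alpha

# Usage with hypothetical values.
low = apply_frame_snr_floor([1.5, 2.0], frame_snr=-3.0, snr_min=0.0, alpha_max=5.0)
high = apply_frame_snr_floor([1.5, 2.0], frame_snr=10.0, snr_min=0.0, alpha_max=5.0)
```

In the first call all factors become 5.0; in the second the sub-band factors pass through unchanged.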

Furthermore, in this embodiment, step S204 employs a regression scheme
to estimate the SNR value for determining the over-subtraction factor of
the sub-band. However, in practical applications, the SNR value of a
sub-band can also be determined by other known speech signal SNR
estimation methods, for example, the high-order statistics method
described in Elias Nemer, Rafik Goubran and Samy Mahmoud, "SNR estimation
of speech signals using subbands and fourth-order statistics", IEEE Signal
Processing Letters, 1999, vol. 6, no. 7, pp. 171-174, which is
incorporated herein for reference.

To verify the effect of the present noise reduction method, noisy
speech data is generated by adding white Gaussian noise of various
magnitudes to clean speech data to form 3 segmental SNRs: 15 dB, 10 dB and
5 dB. Eight clean speech sentences are collected, 5 from males and 3 from
females. Table 1 compares the averaged segmental SNR improvements of the
conventional over-subtraction method (with parameters $\alpha_0 = 7.5$ and
$SNR_1 = 20$) and those of the present method (with parameters
$\alpha_0(1\sim18) = 2$, $SNR_1(1\sim13) = 1.5$, $SNR_1(14\sim18) = 1.25$) with
sub-band SNR obtained from clean speech data.

Table 1

Input SNR    Conventional        Present sub-band    Improvement of the
             over-subtraction    over-subtraction    present method
15 dB        2.39                3.33                39.3%
10 dB        3.86                4.76                23.3%
5 dB         5.64                6.64                17.5%


Table 1 shows that at 15 dB input SNR, the present method has the
potential to achieve an improvement of about 40% over the conventional
method. The potential improvement increases with input SNR.
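
For readers who wish to reproduce figures of this kind, the sketch below computes a segmental SNR and the relative improvement between two methods. The text does not define segmental SNR, so the common definition (mean of per-frame SNRs in dB against a clean reference) is assumed; the function names, the frame length, and the skipping of degenerate frames are likewise assumptions.

```python
import numpy as np

def segmental_snr(clean, processed, frame_len=256):
    """Mean per-frame SNR in dB of `processed` against the clean reference."""
    clean = np.asarray(clean, dtype=float)
    processed = np.asarray(processed, dtype=float)
    n_frames = min(len(clean), len(processed)) // frame_len
    values = []
    for r in range(n_frames):
        seg = slice(r * frame_len, (r + 1) * frame_len)
        signal_energy = np.sum(clean[seg] ** 2)
        error_energy = np.sum((clean[seg] - processed[seg]) ** 2)
        # Assumption: frames with zero signal or zero error are skipped
        # so that the logarithm stays finite.
        if signal_energy > 0.0 and error_energy > 0.0:
            values.append(10.0 * np.log10(signal_energy / error_energy))
    if not values:
        raise ValueError("no usable frames")
    return float(np.mean(values))

def relative_improvement(conventional, present):
    """Percentage gain of `present` over `conventional`."""
    return 100.0 * (present - conventional) / conventional

# 15 dB row of Table 1: conventional 2.39, present 3.33.
gain_15_db = relative_improvement(2.39, 3.33)
```

Applied to the rounded entries of Table 1, `relative_improvement` gives about 39.3% and 23.3% for the 15 dB and 10 dB rows, matching the table. For the 5 dB row it gives about 17.7% rather than the tabulated 17.5%, so the published percentages were presumably computed from unrounded values.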



Table 2 compares the averaged segmental SNR improvements of the
conventional over-subtraction method (with parameters $\alpha_0 = 7.5$ and
$SNR_1 = 20$) and those of the present method (with parameters
$\alpha_0(1\sim18) = 2$, $\mu = 0.25$, $SNR_1(1\sim9) = 10$,
$SNR_1(10\sim13) = 15$, $SNR_1(14\sim16) = 2$, and $SNR_1(17\sim18)$ =
[value illegible in source]) with sub-band SNR obtained from the sub-band
SNR estimation of step S204.



Table 2

Input SNR    Conventional        Present sub-band    Improvement of the
             over-subtraction    over-subtraction    present method
15 dB        2.39                2.80                17.0%
10 dB        3.86                4.09                6.0%
5 dB         5.64                5.96                5.7%


Table 2 shows that at an input SNR of 15 dB, even though the sub-band SNR
values are obtained by estimation, the present method still achieves a
17% improvement over the conventional method.



Although the present invention has been explained in relation to its
preferred embodiment, it is to be understood that many other possible
modifications and variations can be made without departing from the spirit
and scope of the invention as hereinafter claimed.



